ML Topics

1. Learning Model (Chapters 2 and 3)

  • Formal model with definitions
  • Statements of the propositions
  • Proof of Corollary 2.3
  • Definition of PAC learnability with 0-1 loss
  • Sample complexity for PAC learnability of finite hypothesis classes with 0-1 loss (see the worked bound after this list)
  • Definition of agnostic PAC learnability for general loss functions
  • Definitions related to empirical and true error for general loss functions, etc.
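
For the sample-complexity bullet above, a standard form of the bound is the following (constants and rounding as in the usual textbook statement; the exact form on the slides may differ):

```latex
% Finite hypothesis class H, realizable PAC learning with 0-1 loss:
% ERM over H succeeds with accuracy eps and confidence 1 - delta whenever
\[
  m \;\ge\; \left\lceil \frac{\log\bigl(|\mathcal{H}|/\delta\bigr)}{\epsilon} \right\rceil .
\]
```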

2. Uniform Convergence (Chapter 4)

  • Definition of ε-representative sample (all details)
  • Lemma 4.2: statement and proof with all details
  • Definition of uniform convergence property: main idea (details of definition not required)
  • Corollary 4.4: main idea (details of statement not required)
  • Corollary 4.6: statement and proof with all details
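
The quantitative content of Corollary 4.6 is the agnostic sample-complexity bound for finite classes; a common form (the exact constants may differ on the slides) is:

```latex
% Finite H has the uniform convergence property, and ERM over H is an
% agnostic PAC learner, with sample complexities
\[
  m^{\mathrm{UC}}_{\mathcal{H}}(\epsilon,\delta)
    \;\le\; \left\lceil \frac{\log\bigl(2|\mathcal{H}|/\delta\bigr)}{2\epsilon^{2}} \right\rceil ,
  \qquad
  m_{\mathcal{H}}(\epsilon,\delta)
    \;\le\; \left\lceil \frac{2\log\bigl(2|\mathcal{H}|/\delta\bigr)}{\epsilon^{2}} \right\rceil .
\]
```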

3. Bias-Complexity Tradeoff (Chapter 5)

  • No Free Lunch (NFL) theorem, NFL and prior knowledge: only main idea
  • Corollary 5.2: statement with all details (no proof)
  • Approximation error, estimation error, complexity and error decomposition: all details
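
A compact reference for the decomposition bullet above (h_S is the learned hypothesis, D the data distribution):

```latex
% Error decomposition: approximation error + estimation error.
\[
  L_{\mathcal{D}}(h_S)
    \;=\; \underbrace{\min_{h \in \mathcal{H}} L_{\mathcal{D}}(h)}_{\epsilon_{\mathrm{app}}}
    \;+\; \underbrace{\Bigl(L_{\mathcal{D}}(h_S) - \min_{h \in \mathcal{H}} L_{\mathcal{D}}(h)\Bigr)}_{\epsilon_{\mathrm{est}}} .
\]
% A richer H lowers the approximation error but tends to raise the
% estimation error: this is the bias-complexity tradeoff.
```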

4. VC-dimension (Chapter 6)

  • Restrictions, shattering, VC-dimension: definitions in detail
  • Fundamental Theorems of Statistical Learning and bound on generalization: in detail, as on the slides (no proof)
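
A compact form of the quantitative generalization bound above (the absolute constants are customarily left unspecified):

```latex
% For VCdim(H) = d < infinity, there are absolute constants C_1, C_2 with
\[
  C_1 \,\frac{d + \log(1/\delta)}{\epsilon^{2}}
  \;\le\; m_{\mathcal{H}}(\epsilon,\delta) \;\le\;
  C_2 \,\frac{d + \log(1/\delta)}{\epsilon^{2}}
\]
% in the agnostic case; the realizable case replaces eps^2 by eps
% (up to log factors).
```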

5. Linear Models (Chapter 9)

  • Linear predictors/models: definitions with all details
  • Linear regression: definitions, matrix form, derivation of the best predictor, use of the generalized inverse in detail (derivation of the generalized inverse: not required)
  • R²: definition and interpretation in detail
  • Linear classification (perceptron): definitions and algorithm in detail (see the sketch after this list)
  • Proposition on perceptron convergence: only main idea (as in slide “Perceptron: Notes”)
  • VC-dimension of linear models: in detail
  • Logistic regression: all details
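
A minimal sketch of the (batch) Perceptron referenced above, assuming labels in {-1, +1} and a bias term absorbed into the features; the data at the bottom is hypothetical:

```python
import numpy as np

def perceptron(X, y, max_iters=1000):
    """Batch Perceptron. X: (m, d) examples, y: (m,) labels in {-1, +1}.

    Returns w with y_i * <w, x_i> > 0 for all i if the data is linearly
    separable (and max_iters is large enough).
    """
    w = np.zeros(X.shape[1])
    for _ in range(max_iters):
        mistakes = np.flatnonzero(y * (X @ w) <= 0)  # misclassified examples
        if mistakes.size == 0:
            return w                     # all examples correctly classified
        i = mistakes[0]
        w = w + y[i] * X[i]              # Perceptron update rule
    return w                             # may not have converged

# hypothetical separable data; last feature column is the constant 1 (bias)
X = np.array([[1.0, 2.0, 1.0], [2.0, 1.0, 1.0],
              [-1.0, -1.5, 1.0], [-2.0, -0.5, 1.0]])
y = np.array([1, 1, -1, -1])
print(perceptron(X, y))
```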

6. Model Selection and Validation (Chapter 11)

  • Validation: all details, including statement of bound on generalization error using validation set (Theorem 11.1, no proof)
  • Validation for model selection: all details, including statement of bound on generalization error using validation set (Theorem 11.2, no proof)
  • Model-selection curve, train-validation-test split, k-fold cross-validation: all details (see the cross-validation sketch after this list)
  • What if learning fails: main idea
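
A minimal k-fold cross-validation sketch for the bullet above; the train_fn/error_fn interface is a hypothetical one chosen for illustration:

```python
import numpy as np

def k_fold_cv(X, y, train_fn, error_fn, k=5, seed=0):
    """Average validation error of a learning rule over k folds.

    train_fn(X_tr, y_tr) -> predictor h (callable: h(X) -> predictions)
    error_fn(y_true, y_pred) -> scalar error on one fold
    """
    m = len(y)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(m), k)   # shuffled index folds
    errors = []
    for fold in folds:
        mask = np.ones(m, dtype=bool)
        mask[fold] = False                          # hold out this fold
        h = train_fn(X[mask], y[mask])              # train on the rest
        errors.append(error_fn(y[fold], h(X[fold])))
    return float(np.mean(errors))
```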

7. Regularization and Feature Selection (Chapters 13 and 25)

  • Regularized Loss Minimization: all details
  • ℓ1 regularization, LASSO: all details
  • Tikhonov Regularization, Ridge Regression, derivation of the optimal solution for Ridge Regression: all details (see the closed-form sketch after this list)
  • Subset selection, forward selection, backward selection, without and with validation data: all details
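
A sketch of the ridge-regression closed form from the derivation bullet above; the scaling convention for the regularization parameter (here λ is multiplied by m to match an averaged squared loss) may differ on the slides:

```python
import numpy as np

def ridge_regression(X, y, lam):
    """Solve w = argmin_w  lam * ||w||^2 + (1/m) * ||X w - y||^2.

    Setting the gradient to zero gives (X^T X + lam * m * I) w = X^T y,
    which always has a unique solution for lam > 0.
    """
    m, d = X.shape
    return np.linalg.solve(X.T @ X + lam * m * np.eye(d), X.T @ y)
```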

8. GD, SGD, and SVM (Chapters 14 and 15)

  • Gradient descent (GD): all details
  • Stochastic gradient descent (SGD): all details
  • Hard-SVM optimization problem: all details
  • Hard-SVM: equivalent formulation and quadratic programming formulation: main idea
  • Definition of support vectors for hard-SVM: all details
  • Soft-SVM optimization problem, hinge loss: all details
  • SGD for solving soft-SVM (algorithm): all details (see the sketch after this list)
  • Hard-SVM dual formulation: main idea
  • SVM for regression: all details for the optimization problem and the definition of support vectors only
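
A sketch of SGD for soft-SVM as referenced above, in the common step-size 1/(λt), averaged-output variant; labels are assumed to be in {-1, +1}:

```python
import numpy as np

def sgd_soft_svm(X, y, lam, T, seed=0):
    """SGD for  min_w  lam * ||w||^2 + (1/m) * sum_i hinge(y_i * <w, x_i>)."""
    rng = np.random.default_rng(seed)
    m, d = X.shape
    theta = np.zeros(d)
    w_sum = np.zeros(d)
    for t in range(1, T + 1):
        w = theta / (lam * t)            # current iterate, step size 1/(lam*t)
        i = rng.integers(m)              # pick one example uniformly at random
        if y[i] * (X[i] @ w) < 1:        # hinge loss active: take its subgradient
            theta = theta + y[i] * X[i]
        w_sum += w
    return w_sum / T                     # output the average of the iterates
```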

9. Kernels and SVM (Chapter 16)

  • Definition of kernel: all details
  • Kernel trick for SVM: all details
  • Commonly used kernels for SVM: all details
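
Two commonly used kernels for SVM, plus the Gram matrix that the kernel trick substitutes for explicit feature maps; parameter defaults are illustrative:

```python
import numpy as np

def polynomial_kernel(x, z, degree=3):
    """K(x, z) = (1 + <x, z>)^degree."""
    return (1.0 + np.dot(x, z)) ** degree

def gaussian_kernel(x, z, sigma=1.0):
    """Gaussian (RBF) kernel: K(x, z) = exp(-||x - z||^2 / (2 * sigma^2))."""
    return np.exp(-np.sum((np.asarray(x) - np.asarray(z)) ** 2) / (2.0 * sigma ** 2))

def gram_matrix(X, kernel):
    """G[i, j] = kernel(X[i], X[j]); kernelized SVM only ever needs G."""
    m = len(X)
    return np.array([[kernel(X[i], X[j]) for j in range(m)] for i in range(m)])
```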

10. Neural Networks and Deep Learning (Chapter 20)

  • Neuron, activation function, network architecture, point of view of one node, hypothesis set, matrix notation: all details
  • General construction of NN for a given Boolean formula: all details
  • Expressiveness of NNs: main idea
  • Sample complexity, runtime of learning NNs: main idea
  • Forward propagation algorithm: all details (see the sketch after this list)
  • SGD and Backpropagation algorithm: main idea (pseudocode: only main structure)
  • Regularized NNs: main idea
  • CNNs: differences with standard NNs, convolution properties, convolutional layer, pooling layer, dropout, early stopping, data augmentation, cross entropy loss function: main idea
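
A minimal forward-propagation sketch in matrix notation, as referenced in the list above; the tanh activation and the 2-3-1 architecture at the bottom are illustrative assumptions:

```python
import numpy as np

def forward_propagation(x, layers, activation=np.tanh):
    """Forward pass through a fully connected network.

    layers: list of (W, b) pairs; each hidden layer computes
    a <- activation(W @ a + b), and the output layer is left linear.
    """
    a = x
    for W, b in layers[:-1]:
        a = activation(W @ a + b)        # hidden layer: affine map + nonlinearity
    W, b = layers[-1]
    return W @ a + b                     # linear output layer

# hypothetical 2-3-1 network with random weights
rng = np.random.default_rng(0)
layers = [(rng.standard_normal((3, 2)), rng.standard_normal(3)),
          (rng.standard_normal((1, 3)), rng.standard_normal(1))]
print(forward_propagation(np.array([0.5, -1.0]), layers))
```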

11. Clustering (Chapter 22)

  • Unsupervised learning introduction: all details
  • Clustering definition and difficulties: main idea
  • Model for clustering: all details
  • k-means clustering and Lloyd’s algorithm: all details (see the sketch after this list)
  • Complexity of Lloyd’s algorithm: all details
  • k-means++: all details
  • Linkage-based clustering: all details
  • Silhouette score: all details
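
A minimal sketch of Lloyd’s algorithm for k-means, as referenced above; initialization here is plain uniform sampling of k data points (k-means++ would replace it with distance-weighted seeding):

```python
import numpy as np

def lloyd_kmeans(X, k, max_iters=100, seed=0):
    """Alternate assignment and centroid-update steps until assignments stabilize."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]  # random init
    assign = np.full(len(X), -1)
    for _ in range(max_iters):
        # assignment step: each point goes to its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_assign = dists.argmin(axis=1)
        if np.array_equal(new_assign, assign):
            break                         # converged: partition is stable
        assign = new_assign
        # update step: move each centroid to the mean of its cluster
        for j in range(k):
            if np.any(assign == j):
                centroids[j] = X[assign == j].mean(axis=0)
    return centroids, assign
```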